[IR] Add llvm `clmul` intrinsic #140301

oscardssmith · 2025-05-16T19:25:47Z

This is the generic version of int_x86_pclmulqdq, and riscv_clmul as discussed in https://discourse.llvm.org/t/rfc-carry-less-multiplication-instruction/55819/26, (and will allow implementations for powerpc and aarch64 backends that have this instruction but no backend intrinsic).

So far I have only hooked this up for the RISCV backend, but the x86 backend should be pretty easy as well.

This is my first LLVM PR, so please tell me everything that I've messed up.

github-actions · 2025-05-16T19:26:08Z

Thank you for submitting a Pull Request (PR) to the LLVM Project!

This PR will be automatically labeled and the relevant teams will be notified.

If you wish to, you can add reviewers by using the "Reviewers" section on this page.

If this is not working for you, it is probably because you do not have write permissions for the repository. In which case you can instead tag reviewers by name in a comment by using @ followed by their GitHub username.

If you have received no comments on your PR for a week, you can request a review by "ping"ing the PR by adding a comment “Ping”. The common courtesy "ping" rate is once a week. Please remember that you are asking for valuable time from other developers.

If you have further questions, they may be answered by the LLVM GitHub User Guide.

You can also ask questions in a comment on this PR, on the LLVM Discord or on the forums.

llvmbot · 2025-05-16T19:26:41Z

@llvm/pr-subscribers-llvm-selectiondag

@llvm/pr-subscribers-llvm-ir

Author: Oscar Smith (oscardssmith)

Changes

This is the generic version of int_x86_pclmulqdq, and riscv_clmul as discussed in https://discourse.llvm.org/t/rfc-carry-less-multiplication-instruction/55819/26, (and will allow implementations for powerpc and aarch64 backends that have this instruction but no backend intrinsic).

So far I have only hooked this up for the RISCV backend, but the x86 backend should be pretty easy as well.

This is my first LLVM PR, so please tell me everything that I've messed up.

Full diff: https://github.com/llvm/llvm-project/pull/140301.diff

4 Files Affected:

(modified) llvm/docs/LangRef.rst (+65-17)
(modified) llvm/include/llvm/IR/Intrinsics.td (+8)
(modified) llvm/lib/Target/RISCV/RISCVISelLowering.cpp (+2)
(modified) llvm/test/CodeGen/RISCV/rv64zbc-zbkc-intrinsic.ll (+3-3)

diff --git a/llvm/docs/LangRef.rst b/llvm/docs/LangRef.rst
index a1ae6611acd3c..636f18f28610b 100644
--- a/llvm/docs/LangRef.rst
+++ b/llvm/docs/LangRef.rst
@@ -10471,8 +10471,8 @@ its two operands.
 
 .. note::
 
-	The instruction is implemented as a call to libm's '``fmod``'
-	for some targets, and using the instruction may thus require linking libm.
+    The instruction is implemented as a call to libm's '``fmod``'
+    for some targets, and using the instruction may thus require linking libm.
 
 
 Arguments:
@@ -18055,6 +18055,54 @@ Example:
       %r = call i8 @llvm.fshr.i8(i8 15, i8 15, i8 11)  ; %r = i8: 225 (0b11100001)
       %r = call i8 @llvm.fshr.i8(i8 0, i8 255, i8 8)   ; %r = i8: 255 (0b11111111)
 
+.. clmul:
+
+'``clmul.*``' Intrinsic
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Syntax
+"""""""
+
+This is an overloaded intrinsic. You can use ``llvm.clmul``
+on any integer bit width or vectors of integers.
+
+::
+
+      declare i16 @llvm.clmul.i16(i16 %a, i16 %b)
+      declare i32 @llvm.clmul.i32(i32 %a, i32 %b)
+      declare i64 @llvm.clmul.i64(i64 %a, i64 %b)
+      declare <4 x i32> @llvm.clmult.v4i32(<4 x i32> %a, <4 x i32> %b)
+
+Overview
+"""""""""
+
+The '``llvm.clmul``' family of intrinsics functions perform carryless multiplication
+(also known as xor multiplication) on the 2 arguments.
+
+Arguments
+""""""""""
+
+The arguments (%a and %b) and the result may be of integer types of any bit
+width, but they must have the same bit width. ``%a`` and ``%b`` are the two
+values that will undergo carryless multiplication.
+
+Semantics:
+""""""""""
+
+The ‘llvm.clmul’ intrinsic computes carryless multiply of ``%a`` and ``%b``, which is the result
+of applying the standard multiplication algorithm if you replace all of the aditions with exclusive ors.
+The vector intrinsics, such as llvm.clmul.v4i32, operate on a per-element basis and the element order is not affected.
+
+Examples
+"""""""""
+
+.. code-block:: llvm
+
+      %res = call i4 @llvm.clmul.i4(i4 1, i4 2)  ; %res = 2
+      %res = call i4 @llvm.clmul.i4(i4 5, i4 6)  ; %res = 14
+      %res = call i4 @llvm.clmul.i4(i4 -4, i4 2)  ; %res = -8
+      %res = call i4 @llvm.clmul.i4(i4 -4, i4 -5)  ; %res = -12
+
 Arithmetic with Overflow Intrinsics
 -----------------------------------
 
@@ -24244,14 +24292,14 @@ Examples:
 
 .. code-block:: text
 
-	 %r = call <8 x i64> @llvm.experimental.vp.strided.load.v8i64.i64(i64* %ptr, i64 %stride, <8 x i64> %mask, i32 %evl)
-	 ;; The operation can also be expressed like this:
+     %r = call <8 x i64> @llvm.experimental.vp.strided.load.v8i64.i64(i64* %ptr, i64 %stride, <8 x i64> %mask, i32 %evl)
+     ;; The operation can also be expressed like this:
 
-	 %addr = bitcast i64* %ptr to i8*
-	 ;; Create a vector of pointers %addrs in the form:
-	 ;; %addrs = <%addr, %addr + %stride, %addr + 2 * %stride, ...>
-	 %ptrs = bitcast <8 x i8* > %addrs to <8 x i64* >
-	 %also.r = call <8 x i64> @llvm.vp.gather.v8i64.v8p0i64(<8 x i64* > %ptrs, <8 x i64> %mask, i32 %evl)
+     %addr = bitcast i64* %ptr to i8*
+     ;; Create a vector of pointers %addrs in the form:
+     ;; %addrs = <%addr, %addr + %stride, %addr + 2 * %stride, ...>
+     %ptrs = bitcast <8 x i8* > %addrs to <8 x i64* >
+     %also.r = call <8 x i64> @llvm.vp.gather.v8i64.v8p0i64(<8 x i64* > %ptrs, <8 x i64> %mask, i32 %evl)
 
 
 .. _int_experimental_vp_strided_store:
@@ -24295,7 +24343,7 @@ The '``llvm.experimental.vp.strided.store``' intrinsic stores the elements of
 '``val``' in the same way as the :ref:`llvm.vp.scatter <int_vp_scatter>` intrinsic,
 where the vector of pointers is in the form:
 
-	``%ptrs = <%ptr, %ptr + %stride, %ptr + 2 * %stride, ... >``,
+    ``%ptrs = <%ptr, %ptr + %stride, %ptr + 2 * %stride, ... >``,
 
 with '``ptr``' previously casted to a pointer '``i8``', '``stride``' always interpreted as a signed
 integer and all arithmetic occurring in the pointer type.
@@ -24305,14 +24353,14 @@ Examples:
 
 .. code-block:: text
 
-	 call void @llvm.experimental.vp.strided.store.v8i64.i64(<8 x i64> %val, i64* %ptr, i64 %stride, <8 x i1> %mask, i32 %evl)
-	 ;; The operation can also be expressed like this:
+     call void @llvm.experimental.vp.strided.store.v8i64.i64(<8 x i64> %val, i64* %ptr, i64 %stride, <8 x i1> %mask, i32 %evl)
+     ;; The operation can also be expressed like this:
 
-	 %addr = bitcast i64* %ptr to i8*
-	 ;; Create a vector of pointers %addrs in the form:
-	 ;; %addrs = <%addr, %addr + %stride, %addr + 2 * %stride, ...>
-	 %ptrs = bitcast <8 x i8* > %addrs to <8 x i64* >
-	 call void @llvm.vp.scatter.v8i64.v8p0i64(<8 x i64> %val, <8 x i64*> %ptrs, <8 x i1> %mask, i32 %evl)
+     %addr = bitcast i64* %ptr to i8*
+     ;; Create a vector of pointers %addrs in the form:
+     ;; %addrs = <%addr, %addr + %stride, %addr + 2 * %stride, ...>
+     %ptrs = bitcast <8 x i8* > %addrs to <8 x i64* >
+     call void @llvm.vp.scatter.v8i64.v8p0i64(<8 x i64> %val, <8 x i64*> %ptrs, <8 x i1> %mask, i32 %evl)
 
 
 .. _int_vp_gather:
diff --git a/llvm/include/llvm/IR/Intrinsics.td b/llvm/include/llvm/IR/Intrinsics.td
index e1a135a5ad48e..1857829910340 100644
--- a/llvm/include/llvm/IR/Intrinsics.td
+++ b/llvm/include/llvm/IR/Intrinsics.td
@@ -1431,6 +1431,8 @@ let IntrProperties = [IntrNoMem, IntrSpeculatable, IntrWillReturn] in {
       [LLVMMatchType<0>, LLVMMatchType<0>, LLVMMatchType<0>]>;
   def int_fshr : DefaultAttrsIntrinsic<[llvm_anyint_ty],
       [LLVMMatchType<0>, LLVMMatchType<0>, LLVMMatchType<0>]>;
+  def int_clmul : DefaultAttrsIntrinsic<[llvm_anyint_ty],
+      [LLVMMatchType<0>, LLVMMatchType<0>, LLVMMatchType<0>]>;
 }
 
 let IntrProperties = [IntrNoMem, IntrSpeculatable, IntrWillReturn,
@@ -2103,6 +2105,12 @@ let IntrProperties = [IntrNoMem, IntrNoSync, IntrWillReturn] in {
                                LLVMMatchType<0>,
                                LLVMScalarOrSameVectorWidth<0, llvm_i1_ty>,
                                llvm_i32_ty]>;
+  def int_vp_clmul : DefaultAttrsIntrinsic<[ llvm_anyvector_ty ],
+                             [ LLVMMatchType<0>,
+                               LLVMMatchType<0>,
+                               LLVMMatchType<0>,
+                               LLVMScalarOrSameVectorWidth<0, llvm_i1_ty>,
+                               llvm_i32_ty]>;
   def int_vp_sadd_sat : DefaultAttrsIntrinsic<[ llvm_anyvector_ty ],
                              [ LLVMMatchType<0>,
                                LLVMMatchType<0>,
diff --git a/llvm/lib/Target/RISCV/RISCVISelLowering.cpp b/llvm/lib/Target/RISCV/RISCVISelLowering.cpp
index fae2cda13863d..6167c375755fd 100644
--- a/llvm/lib/Target/RISCV/RISCVISelLowering.cpp
+++ b/llvm/lib/Target/RISCV/RISCVISelLowering.cpp
@@ -10348,6 +10348,7 @@ SDValue RISCVTargetLowering::LowerINTRINSIC_WO_CHAIN(SDValue Op,
     return DAG.getNode(RISCVISD::MOPRR, DL, XLenVT, Op.getOperand(1),
                        Op.getOperand(2), Op.getOperand(3));
   }
+  case Intrinsic::clmul:
   case Intrinsic::riscv_clmul:
     return DAG.getNode(RISCVISD::CLMUL, DL, XLenVT, Op.getOperand(1),
                        Op.getOperand(2));
@@ -14284,6 +14285,7 @@ void RISCVTargetLowering::ReplaceNodeResults(SDNode *N,
       Results.push_back(DAG.getNode(ISD::TRUNCATE, DL, MVT::i32, Res));
       return;
     }
+    case Intrinsic::clmul:
     case Intrinsic::riscv_clmul: {
       if (!Subtarget.is64Bit() || N->getValueType(0) != MVT::i32)
         return;
diff --git a/llvm/test/CodeGen/RISCV/rv64zbc-zbkc-intrinsic.ll b/llvm/test/CodeGen/RISCV/rv64zbc-zbkc-intrinsic.ll
index aa9e89bc20953..5017f9f4853b5 100644
--- a/llvm/test/CodeGen/RISCV/rv64zbc-zbkc-intrinsic.ll
+++ b/llvm/test/CodeGen/RISCV/rv64zbc-zbkc-intrinsic.ll
@@ -4,7 +4,7 @@
 ; RUN: llc -mtriple=riscv64 -mattr=+zbkc -verify-machineinstrs < %s \
 ; RUN:   | FileCheck %s -check-prefix=RV64ZBC-ZBKC
 
-declare i64 @llvm.riscv.clmul.i64(i64 %a, i64 %b)
+declare i64 @llvm.clmul.i64(i64 %a, i64 %b)
 
 define i64 @clmul64(i64 %a, i64 %b) nounwind {
 ; RV64ZBC-ZBKC-LABEL: clmul64:
@@ -26,7 +26,7 @@ define i64 @clmul64h(i64 %a, i64 %b) nounwind {
   ret i64 %tmp
 }
 
-declare i32 @llvm.riscv.clmul.i32(i32 %a, i32 %b)
+declare i32 @llvm.clmul.i32(i32 %a, i32 %b)
 
 define signext i32 @clmul32(i32 signext %a, i32 signext %b) nounwind {
 ; RV64ZBC-ZBKC-LABEL: clmul32:
@@ -34,7 +34,7 @@ define signext i32 @clmul32(i32 signext %a, i32 signext %b) nounwind {
 ; RV64ZBC-ZBKC-NEXT:    clmul a0, a0, a1
 ; RV64ZBC-ZBKC-NEXT:    sext.w a0, a0
 ; RV64ZBC-ZBKC-NEXT:    ret
-  %tmp = call i32 @llvm.riscv.clmul.i32(i32 %a, i32 %b)
+  %tmp = call i32 @llvm.clmul.i32(i32 %a, i32 %b)
   ret i32 %tmp
 }

llvmbot · 2025-05-16T19:26:41Z

@llvm/pr-subscribers-backend-risc-v

Author: Oscar Smith (oscardssmith)

Changes

This is the generic version of int_x86_pclmulqdq, and riscv_clmul as discussed in https://discourse.llvm.org/t/rfc-carry-less-multiplication-instruction/55819/26, (and will allow implementations for powerpc and aarch64 backends that have this instruction but no backend intrinsic).

So far I have only hooked this up for the RISCV backend, but the x86 backend should be pretty easy as well.

This is my first LLVM PR, so please tell me everything that I've messed up.

Full diff: https://github.com/llvm/llvm-project/pull/140301.diff

4 Files Affected:

(modified) llvm/docs/LangRef.rst (+65-17)
(modified) llvm/include/llvm/IR/Intrinsics.td (+8)
(modified) llvm/lib/Target/RISCV/RISCVISelLowering.cpp (+2)
(modified) llvm/test/CodeGen/RISCV/rv64zbc-zbkc-intrinsic.ll (+3-3)

diff --git a/llvm/docs/LangRef.rst b/llvm/docs/LangRef.rst
index a1ae6611acd3c..636f18f28610b 100644
--- a/llvm/docs/LangRef.rst
+++ b/llvm/docs/LangRef.rst
@@ -10471,8 +10471,8 @@ its two operands.
 
 .. note::
 
-	The instruction is implemented as a call to libm's '``fmod``'
-	for some targets, and using the instruction may thus require linking libm.
+    The instruction is implemented as a call to libm's '``fmod``'
+    for some targets, and using the instruction may thus require linking libm.
 
 
 Arguments:
@@ -18055,6 +18055,54 @@ Example:
       %r = call i8 @llvm.fshr.i8(i8 15, i8 15, i8 11)  ; %r = i8: 225 (0b11100001)
       %r = call i8 @llvm.fshr.i8(i8 0, i8 255, i8 8)   ; %r = i8: 255 (0b11111111)
 
+.. clmul:
+
+'``clmul.*``' Intrinsic
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Syntax
+"""""""
+
+This is an overloaded intrinsic. You can use ``llvm.clmul``
+on any integer bit width or vectors of integers.
+
+::
+
+      declare i16 @llvm.clmul.i16(i16 %a, i16 %b)
+      declare i32 @llvm.clmul.i32(i32 %a, i32 %b)
+      declare i64 @llvm.clmul.i64(i64 %a, i64 %b)
+      declare <4 x i32> @llvm.clmult.v4i32(<4 x i32> %a, <4 x i32> %b)
+
+Overview
+"""""""""
+
+The '``llvm.clmul``' family of intrinsics functions perform carryless multiplication
+(also known as xor multiplication) on the 2 arguments.
+
+Arguments
+""""""""""
+
+The arguments (%a and %b) and the result may be of integer types of any bit
+width, but they must have the same bit width. ``%a`` and ``%b`` are the two
+values that will undergo carryless multiplication.
+
+Semantics:
+""""""""""
+
+The ‘llvm.clmul’ intrinsic computes carryless multiply of ``%a`` and ``%b``, which is the result
+of applying the standard multiplication algorithm if you replace all of the aditions with exclusive ors.
+The vector intrinsics, such as llvm.clmul.v4i32, operate on a per-element basis and the element order is not affected.
+
+Examples
+"""""""""
+
+.. code-block:: llvm
+
+      %res = call i4 @llvm.clmul.i4(i4 1, i4 2)  ; %res = 2
+      %res = call i4 @llvm.clmul.i4(i4 5, i4 6)  ; %res = 14
+      %res = call i4 @llvm.clmul.i4(i4 -4, i4 2)  ; %res = -8
+      %res = call i4 @llvm.clmul.i4(i4 -4, i4 -5)  ; %res = -12
+
 Arithmetic with Overflow Intrinsics
 -----------------------------------
 
@@ -24244,14 +24292,14 @@ Examples:
 
 .. code-block:: text
 
-	 %r = call <8 x i64> @llvm.experimental.vp.strided.load.v8i64.i64(i64* %ptr, i64 %stride, <8 x i64> %mask, i32 %evl)
-	 ;; The operation can also be expressed like this:
+     %r = call <8 x i64> @llvm.experimental.vp.strided.load.v8i64.i64(i64* %ptr, i64 %stride, <8 x i64> %mask, i32 %evl)
+     ;; The operation can also be expressed like this:
 
-	 %addr = bitcast i64* %ptr to i8*
-	 ;; Create a vector of pointers %addrs in the form:
-	 ;; %addrs = <%addr, %addr + %stride, %addr + 2 * %stride, ...>
-	 %ptrs = bitcast <8 x i8* > %addrs to <8 x i64* >
-	 %also.r = call <8 x i64> @llvm.vp.gather.v8i64.v8p0i64(<8 x i64* > %ptrs, <8 x i64> %mask, i32 %evl)
+     %addr = bitcast i64* %ptr to i8*
+     ;; Create a vector of pointers %addrs in the form:
+     ;; %addrs = <%addr, %addr + %stride, %addr + 2 * %stride, ...>
+     %ptrs = bitcast <8 x i8* > %addrs to <8 x i64* >
+     %also.r = call <8 x i64> @llvm.vp.gather.v8i64.v8p0i64(<8 x i64* > %ptrs, <8 x i64> %mask, i32 %evl)
 
 
 .. _int_experimental_vp_strided_store:
@@ -24295,7 +24343,7 @@ The '``llvm.experimental.vp.strided.store``' intrinsic stores the elements of
 '``val``' in the same way as the :ref:`llvm.vp.scatter <int_vp_scatter>` intrinsic,
 where the vector of pointers is in the form:
 
-	``%ptrs = <%ptr, %ptr + %stride, %ptr + 2 * %stride, ... >``,
+    ``%ptrs = <%ptr, %ptr + %stride, %ptr + 2 * %stride, ... >``,
 
 with '``ptr``' previously casted to a pointer '``i8``', '``stride``' always interpreted as a signed
 integer and all arithmetic occurring in the pointer type.
@@ -24305,14 +24353,14 @@ Examples:
 
 .. code-block:: text
 
-	 call void @llvm.experimental.vp.strided.store.v8i64.i64(<8 x i64> %val, i64* %ptr, i64 %stride, <8 x i1> %mask, i32 %evl)
-	 ;; The operation can also be expressed like this:
+     call void @llvm.experimental.vp.strided.store.v8i64.i64(<8 x i64> %val, i64* %ptr, i64 %stride, <8 x i1> %mask, i32 %evl)
+     ;; The operation can also be expressed like this:
 
-	 %addr = bitcast i64* %ptr to i8*
-	 ;; Create a vector of pointers %addrs in the form:
-	 ;; %addrs = <%addr, %addr + %stride, %addr + 2 * %stride, ...>
-	 %ptrs = bitcast <8 x i8* > %addrs to <8 x i64* >
-	 call void @llvm.vp.scatter.v8i64.v8p0i64(<8 x i64> %val, <8 x i64*> %ptrs, <8 x i1> %mask, i32 %evl)
+     %addr = bitcast i64* %ptr to i8*
+     ;; Create a vector of pointers %addrs in the form:
+     ;; %addrs = <%addr, %addr + %stride, %addr + 2 * %stride, ...>
+     %ptrs = bitcast <8 x i8* > %addrs to <8 x i64* >
+     call void @llvm.vp.scatter.v8i64.v8p0i64(<8 x i64> %val, <8 x i64*> %ptrs, <8 x i1> %mask, i32 %evl)
 
 
 .. _int_vp_gather:
diff --git a/llvm/include/llvm/IR/Intrinsics.td b/llvm/include/llvm/IR/Intrinsics.td
index e1a135a5ad48e..1857829910340 100644
--- a/llvm/include/llvm/IR/Intrinsics.td
+++ b/llvm/include/llvm/IR/Intrinsics.td
@@ -1431,6 +1431,8 @@ let IntrProperties = [IntrNoMem, IntrSpeculatable, IntrWillReturn] in {
       [LLVMMatchType<0>, LLVMMatchType<0>, LLVMMatchType<0>]>;
   def int_fshr : DefaultAttrsIntrinsic<[llvm_anyint_ty],
       [LLVMMatchType<0>, LLVMMatchType<0>, LLVMMatchType<0>]>;
+  def int_clmul : DefaultAttrsIntrinsic<[llvm_anyint_ty],
+      [LLVMMatchType<0>, LLVMMatchType<0>, LLVMMatchType<0>]>;
 }
 
 let IntrProperties = [IntrNoMem, IntrSpeculatable, IntrWillReturn,
@@ -2103,6 +2105,12 @@ let IntrProperties = [IntrNoMem, IntrNoSync, IntrWillReturn] in {
                                LLVMMatchType<0>,
                                LLVMScalarOrSameVectorWidth<0, llvm_i1_ty>,
                                llvm_i32_ty]>;
+  def int_vp_clmul : DefaultAttrsIntrinsic<[ llvm_anyvector_ty ],
+                             [ LLVMMatchType<0>,
+                               LLVMMatchType<0>,
+                               LLVMMatchType<0>,
+                               LLVMScalarOrSameVectorWidth<0, llvm_i1_ty>,
+                               llvm_i32_ty]>;
   def int_vp_sadd_sat : DefaultAttrsIntrinsic<[ llvm_anyvector_ty ],
                              [ LLVMMatchType<0>,
                                LLVMMatchType<0>,
diff --git a/llvm/lib/Target/RISCV/RISCVISelLowering.cpp b/llvm/lib/Target/RISCV/RISCVISelLowering.cpp
index fae2cda13863d..6167c375755fd 100644
--- a/llvm/lib/Target/RISCV/RISCVISelLowering.cpp
+++ b/llvm/lib/Target/RISCV/RISCVISelLowering.cpp
@@ -10348,6 +10348,7 @@ SDValue RISCVTargetLowering::LowerINTRINSIC_WO_CHAIN(SDValue Op,
     return DAG.getNode(RISCVISD::MOPRR, DL, XLenVT, Op.getOperand(1),
                        Op.getOperand(2), Op.getOperand(3));
   }
+  case Intrinsic::clmul:
   case Intrinsic::riscv_clmul:
     return DAG.getNode(RISCVISD::CLMUL, DL, XLenVT, Op.getOperand(1),
                        Op.getOperand(2));
@@ -14284,6 +14285,7 @@ void RISCVTargetLowering::ReplaceNodeResults(SDNode *N,
       Results.push_back(DAG.getNode(ISD::TRUNCATE, DL, MVT::i32, Res));
       return;
     }
+    case Intrinsic::clmul:
     case Intrinsic::riscv_clmul: {
       if (!Subtarget.is64Bit() || N->getValueType(0) != MVT::i32)
         return;
diff --git a/llvm/test/CodeGen/RISCV/rv64zbc-zbkc-intrinsic.ll b/llvm/test/CodeGen/RISCV/rv64zbc-zbkc-intrinsic.ll
index aa9e89bc20953..5017f9f4853b5 100644
--- a/llvm/test/CodeGen/RISCV/rv64zbc-zbkc-intrinsic.ll
+++ b/llvm/test/CodeGen/RISCV/rv64zbc-zbkc-intrinsic.ll
@@ -4,7 +4,7 @@
 ; RUN: llc -mtriple=riscv64 -mattr=+zbkc -verify-machineinstrs < %s \
 ; RUN:   | FileCheck %s -check-prefix=RV64ZBC-ZBKC
 
-declare i64 @llvm.riscv.clmul.i64(i64 %a, i64 %b)
+declare i64 @llvm.clmul.i64(i64 %a, i64 %b)
 
 define i64 @clmul64(i64 %a, i64 %b) nounwind {
 ; RV64ZBC-ZBKC-LABEL: clmul64:
@@ -26,7 +26,7 @@ define i64 @clmul64h(i64 %a, i64 %b) nounwind {
   ret i64 %tmp
 }
 
-declare i32 @llvm.riscv.clmul.i32(i32 %a, i32 %b)
+declare i32 @llvm.clmul.i32(i32 %a, i32 %b)
 
 define signext i32 @clmul32(i32 signext %a, i32 signext %b) nounwind {
 ; RV64ZBC-ZBKC-LABEL: clmul32:
@@ -34,7 +34,7 @@ define signext i32 @clmul32(i32 signext %a, i32 signext %b) nounwind {
 ; RV64ZBC-ZBKC-NEXT:    clmul a0, a0, a1
 ; RV64ZBC-ZBKC-NEXT:    sext.w a0, a0
 ; RV64ZBC-ZBKC-NEXT:    ret
-  %tmp = call i32 @llvm.riscv.clmul.i32(i32 %a, i32 %b)
+  %tmp = call i32 @llvm.clmul.i32(i32 %a, i32 %b)
   ret i32 %tmp
 }

llvm/docs/LangRef.rst

llvm/lib/Target/RISCV/RISCVISelLowering.cpp

llvm/test/CodeGen/RISCV/rv64zbc-zbkc-intrinsic.ll

jayfoad · 2025-05-18T08:47:52Z

This is my first LLVM PR, so please tell me everything that I've messed up.

The title :) Should be "clmul" not "cmul".

llvm/docs/LangRef.rst

llvm/include/llvm/CodeGen/ISDOpcodes.h

llvm/include/llvm/IR/Intrinsics.td

llvm/lib/CodeGen/IntrinsicLowering.cpp

llvm/docs/LangRef.rst

llvm/lib/CodeGen/IntrinsicLowering.cpp

topperc · 2025-05-20T00:20:05Z

We need to make ISD::CLMUL Expand by default for all data types in TargetLoweringBase::initActions()

See existing examples like

    // [US]CMP default to expand                                                 
    setOperationAction({ISD::UCMP, ISD::SCMP}, VT, Expand);                      
                                                                                 
    // Halving adds                                                              
    setOperationAction(                                                          
        {ISD::AVGFLOORS, ISD::AVGFLOORU, ISD::AVGCEILS, ISD::AVGCEILU}, VT,      
        Expand);                                                                 
                                                                                 
    // Absolute difference                                                       
    setOperationAction({ISD::ABDS, ISD::ABDU}, VT, Expand);                      
                                                                                 
    // Saturated trunc                                                           
    setOperationAction(ISD::TRUNCATE_SSAT_S, VT, Expand);                        
    setOperationAction(ISD::TRUNCATE_SSAT_U, VT, Expand);                        
    setOperationAction(ISD::TRUNCATE_USAT_U, VT, Expand);

Then, for RISC-V we need to make it Legal for XLenVT when CLMUL instruction is supported using setOperationAction in the RISCVTargetLowering constructor.

oscardssmith · 2025-05-20T04:11:22Z

@topperc looking at TargetLoweringBase::initActions, given that basically everything there listed explicitly wants to Expand, why aren't they all just in the big list that starts with FGETSIGN? Is there a reason some are separated out?

llvm/docs/LangRef.rst

topperc · 2025-05-20T17:13:29Z

@topperc looking at TargetLoweringBase::initActions, given that basically everything there listed explicitly wants to Expand, why aren't they all just in the big list that starts with FGETSIGN? Is there a reason some are separated out?

I don't know.

llvm/lib/Target/RISCV/RISCVISelLowering.cpp

llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp

llvm/lib/Target/RISCV/RISCVISelLowering.cpp

llvm/include/llvm/IR/Intrinsics.td

llvm/lib/CodeGen/IntrinsicLowering.cpp

llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp

artagnon · 2025-09-09T17:10:41Z

Reverse ping. Are you still pursuing this patch? I can offer to take over the PR, and get it over the finish line.

oscardssmith · 2025-09-09T17:25:38Z

I'd love the help finishing it up! I like this PR but don't actually need it for anything, so I'm unlikely to have time in the near term to finish it up. I think the only thing left is tests, but there's always possibilities that there are bugs that the tests find. (an x86 implementation would also be good, but IMO that should probably wait for a follow up PR),

artagnon · 2025-10-10T10:29:35Z

On second thought, a generic llvm.clmul might not be such a good idea: the intended usecase is in CRC optimization, and a generic lowering would perform worse than a Sarwate table-lookup, so I'm not sure what value a generic lowering would add.

EDIT: Actually, we can have something like PopcntSupportKind, and conditionally emit clmul when it's fast on hardware.

oscardssmith · 2025-10-10T13:31:41Z

@artagnon IMO this is useful for more than CRC optimization. My original reason for wanting this was wanting to play around with clmul as a component of fast integer hash functions, (specifically I wanted a hash that distributes over the cartesian product and clmul makes for a very fast version of this hash).

More generally, if there is a CPU instruction across x86, arm, and riscv, that seems like a pretty good argument for having an LLVM intrinsic rather than 3 separate processor specific intrinsics.

eisenwave · 2025-11-14T10:16:17Z

@oscardssmith it may be worth mentioning that I have a proposal which is very likely going to be in C++29 and which adds a std::clmul function to the standard library: cplusplus/papers#2279

The paper also presents a bunch of further motivation and applications that may not have been discussed here yet.

Being able to lower this like std::clmul -> __builtin_clmul -> @llvm.clmul would certainly be helpful.

eisenwave · 2025-11-14T10:21:37Z

On second thought, a generic llvm.clmul might not be such a good idea: the intended usecase is in CRC optimization, and a generic lowering would perform worse than a Sarwate table-lookup, so I'm not sure what value a generic lowering would add.

@artagnon The generic lowering would certainly be helpful if a std::clmul function should be implemented in terms of some intrinsic.

Not sure how much weight that consideration has here though; I suppose it's always possible to conditionally provide a library fallback when the builtin is unavailable.

eisenwave · 2025-11-14T10:34:28Z

Another consideration is that the optimizer can detect when one of the operand is constant. For example, @llvm.clmul i32 x, 0b001001 can lower to two bit-shifts and XORs basically, even when there is no hardware support.

Reliably getting the same kind of optimization out of loop unrolling and constant folding in a software implementation of std::clmul seems a bit wishful; this can only be done in intrinsics.

artagnon · 2025-11-14T11:08:55Z

it may be worth mentioning that I have a proposal which is very likely going to be in C++29 and which adds a std::clmul function to the standard library: cplusplus/papers#2279

Thanks! I'll try to prioritize this task.

oscardssmith · 2025-11-14T13:23:45Z

@eisenwave thanks for bringing this up! My motiviation here was adding clmul to Julia, so it's good to see that other languages agree that this is a useful function to have.

artagnon · 2025-11-18T12:38:55Z

First step: #168527.

artagnon · 2025-11-19T16:39:19Z

Second step: #168731.

oscardssmith · 2025-11-19T21:06:56Z

Thanks so much for shepherding this through and actually getting it merged!

llvmbot added backend:RISC-V llvm:ir labels May 16, 2025

topperc reviewed May 16, 2025

View reviewed changes

llvmbot added the llvm:SelectionDAG SelectionDAGISel as well label May 18, 2025

jayfoad reviewed May 18, 2025

View reviewed changes

llvm/docs/LangRef.rst Outdated Show resolved Hide resolved

llvm/include/llvm/CodeGen/ISDOpcodes.h Outdated Show resolved Hide resolved

llvm/include/llvm/IR/Intrinsics.td Outdated Show resolved Hide resolved

llvm/lib/CodeGen/IntrinsicLowering.cpp Outdated Show resolved Hide resolved

oscardssmith changed the title ~~[IR] Add llvm cmul intrinsic~~ [IR] Add llvm clmul intrinsic May 18, 2025

oscardssmith commented May 18, 2025

View reviewed changes